Transcriptome analysis and exploration of genes involved in the biosynthesis of secoiridoids in Gentiana rhodantha

Gentiana rhodantha is a medicinally important perennial herb used as traditional Chinese and ethnic medicines. Secoiridoids are one of the major bioactive compounds in G. rhodantha. To better understand the secoiridoid biosynthesis pathway, we generated transcriptome sequences from four organs (root, leaf, stem and flower), followed by the de novo sequence assembly. We verified 8-HGO (8-hydroxygeraniol oxidoreductase), which may encode key enzymes of the secoiridoid biosynthesis by qRT-PCR. The mangiferin, swertiamarin and loganic acid contents in root, stem, leaf, and flower were determined by HPLC. The results showed that there were 47,871 unigenes with an average length of 1,107.38 bp. Among them, 1,422 unigenes were involved in 25 standard secondary metabolism-related pathways in the KEGG database. Furthermore, we found that 1,005 unigenes can be divided into 66 transcription factor (TF) families, with no family members exhibiting significant organ-specificity. There were 54 unigenes in G. rhodantha that encoded 17 key enzymes of the secoiridoid biosynthetic pathway. The qRT-PCR of the 8-HGO and HPLC results showed that the relative expression and the mangiferin, swertiamarin, and loganic acid contents of the aerial parts were higher than in the root. Six types of SSR were identified by SSR analysis of unigenes: mono-nucleoside repeat SSR, di-nucleoside repeat SSR, tri-nucleoside repeat SSR, tetra-nucleoside repeat SSR, penta-nucleoside repeat SSR, and hexa-nucleoside repeat SSR. This report not only enriches the Gentiana transcriptome database but helps further study the function and regulation of active component biosynthesis of G. rhodantha.


INTRODUCTION
The herbaceous Gentiana genus (family Gentianaceae) comprises about 500 species worldwide, and is widely distributed in the temperate and tropical alpine regions of the northern hemisphere including Europe, Asia, northern Australia, New Zealand, North America, reaches Cape Horn along the Andes and northern Africa. There are approximately 247 species in China, which are mainly distributed in the southwest mountainous area. Gentiana has multiple pharmacological effects, including hepatoprotective, antiinflammatory, antipyretic, etc (Editorial Committee of Chinese Flora, 1988;Dong et al., platform, and then deciphered all the candidate genes and putative transcription factors involved in the mangiferin and secoiridoid biosynthetic pathway. Although mangiferin is one of the main active components of G. rhodantha, the mangiferin biosynthetic pathway is unavailable in the KEGG official website to date. Therefore, this study focused on the analysis and identification of the putative genes involved in secoiridoid biosynthesis, aiming to provide useful insights into their further quality regulation. Additionally, numerous simple sequence repeats (SSR) markers were found, which will facilitate the marker-assisted breeding of G. rhodantha.

MATERIALS AND METHODS
G. rhodantha used in this experiment, was collected from the Yiduoyun village, Kunming, Yunnan, China (25 • 00 5.62 N, 102 • 58 46.35 E, 1,068 m). The taxonomic identities of the voucher specimens were identified by the corresponding author. Natural wild G. rhodantha was sampled at the full bloom stage. Fresh root, stem, leaf, and flower were collected from three plants having relatively consistent growth and synchronized development (Figs. 1A, 1B). One half was quickly frozen in liquid nitrogen and stored at −80 • C, while the other half was dried to constant weight at 55 • C and used for determining the mangiferin, swertiamarin, and loganic acid contents.

RNA extraction and cDNA library preparation and RNA-Seq analysis
RNA was extracted from the root, leaf, stem, and flower. First, 1 µg RNA per sample was used for the RNA sample preparations. Sequencing libraries were generated using the NEBNext R Ultra TM RNA Library Prep Kit for Illumina R (New England Biolabs, Ipswich, MA, USA) following the manufacturer's recommendations and index codes were added to attribute sequences to each sample. Twelve libraries were obtained from three biological replicates per organ. Illumina Novaseq 6,000 sequencing was performed after the library was qualified. The average sequencing depth (count* 150/gene_len) was approximately 456.

De novo assembly and sequence processing
First, the raw data (raw reads) of the fastq format were processed through the inhouse Perl scripts. Filter joint, low quality, sequences with N bases, low quality bases (Q <20) were removed to get high-quality clean data (Bolger, Marc & Bjoern, 2014). Firstly, Breaking sequencing reads into short segments (K-mer) via TRINITY (https: //github.com/trinityrnaseq/trinityrnaseq/wiki) under the parameter of (-min_contig_length 200, -group_pairs_distance 500). These small fragments were then extended into longer segments (contigs). Overlaps between these fragments were used to get a collection of fragments (component). Finally, the De Brujin graph method and sequencing read information were used to identify transcript sequences in each fragment collection. The RSeQC (RNA-seq data QC) software was used to remove the redundant sequences in the transcript to obtain the unigene (Haas, Papanicolaou & Yassour, 2013 Note: In the formula, cDNA fragments indicates the number of fragments as compared to a certain transcript, the number of double-ended reads; mapped fragments (millions) indicates the total number of fragments as compared to a transcript (in 10 6 units); transcript length (kb): transcript length (in 10 3 bases).

Screening for secoiridoid biosynthesis genes
By referring the relevant secoiridoid biosynthesis metabolism pathways (Wu & Liu, 2017;Yang, Fang & Li, 2018), the results of nine database annotations were combined. The secoiridoid biosynthesis-related unigenes in the G. rhodantha transcription data were uncovered. The direct embodiment of a gene expression level is the abundance of its transcript. The higher the transcript abundance, the higher was the gene expression level. After referring the secoiridoid biosynthesis pathway in G. scabra, G. rigescens, and G. macrophylla, a possible biosynthetic pathway of G. rhodantha was speculated. Combined with the annotation results in the Nr and KEGG databases, the secoiridoid biosynthesisrelated unigene in the transcriptome data was mined, and the expression amount was calculated using the TPM (transcripts per million) value.

Differential expression analysis
DESeq2 (v1.6.3) (Love, Huber & Anders, 2014) was used for differential expression analysis between samples to obtain the differential gene expression set of the two conditions. In the process of differential expression analysis, the Benjamini-Hochberg method was used to correct the significance (p-value) obtained by the original hypothesis test. Finally, the corrected p-value and False Discovery Rate (FDR) were adopted as the key indicator of differential expression gene screening. During the screening process, FDR <0.01 and FC |(fold change)| ≥2 were used as the screening criteria.

qRT-PCR analysis
The 8-HGO was screened from the secoiridoid biosynthesis pathway using the screening criteria of | log2 (FC) |≥ 2 and FDR <0.01. The 18S rRNA is present in the ribosomal subunit, and its encoding gene rDNA (18S rRNA/rDNA) is evolutionarily conserved. The relative expression of 8-HGO from four organs was verified with 18S rRNA as the internal reference gene. qRT-PCR was performed using the Analytik Jena-qTOWER2.2 (Analytik Jena, Jena, Germany) with TUREscript 1st Stand cDNA SYNTHESIS Kit (Aidlab, Hong Kong). Gene-specific primers were designed using Primer Premier 5.0, and the primer sequences are listed in the ( Table 1). The relative gene expression was calculated by the 2 − Ct method (Pfaffl, 2001).

Illumina sequencing and read assembly
We obtained a total of 12 libraries from the four organs, including high-quality reads fragments from 20, 423,964, 21,194,626 and 21,917,414 of root, 21,475,005, 20, 386,021, and 19,271,209 of stem, 20,873,444, 21,275,398, and 20,960,228 of leaf and 21,230,658, 21,811,288, and 22,615,547 of flower. After sequence assembly, a total of 47,871 unigenes were obtained. The length of N50 was 1,826 bp, with an average length of 1,107.38 bp (Table  2). Pearson's correlation coefficient (r) was used as an indicator of studying inter-sample correlation (Schulze, Kanwar & Gölzenleuchter, 2012). The closer r 2 is to 1, the stronger was the correlation between the two samples ( Fig. 2B). The figure shows that the correlation of three repeats within both the stem and root was >0.8, which indicated high reproducibility. However, the correlation of three repeats within both leaf and flower was not good. PCA ( Fig. 2A) analysis showed similar results as the heat maps. A BUSCO analysis was performed to evaluate the completeness, and we recovered 1,292 of the 1,614 conserved eukaryotic genes (80%) (Fig. 1C).
We further enriched these TF families into KEGG metabolic pathways. Two unigenes encoding the C2H2 family of TFs were enriched in tropane, piperidine, and pyridine alkaloid biosynthesis, whereas six unigenes encoding the zf-HD TF family members were enriched in betalain biosynthesis. Furthermore, three unigenes encoding the trihelix TF family members were enriched in ubiquinone and other terpenoid quinone biosynthesis. Finally, three unigenes encoding C3H TFs family and one encoding bHLH TFs family were enriched in the phylalanine, tyrosine, and tryptophan biosynthetic pathways (Table 4).

Analysis of KEGG pathways
The KEGG pathways in G. rhodantha can be divided into five categories: cellular processes, environmental information processing, genetic information processing, metabolism, and organic systems, which we mapped to 136 KEGG pathways. The top three metabolic pathways were carbon (763, 6.95%), amino acids (510, 4.65%), and glycolysis/gluconeogenesis (432, 3.94%), with the rest being mainly enriched in pentose and gluconate interconversion along with tarch and sucrose metabolism. The secondary metabolites contained in higher plants are closely related to their medicinal ingredients. In this regard, we assigned 1,422 unigenes to the 25 secondary metabolic pathways in G. rhodantha. Among them, 172 unigenes encoded key enzymes involved in the terpenoid biosynthesis pathway, including terpenoid backbone (73  (Table 5).

Differential gene expression analysis
Since gene expression is spatiotemporal specific, we pairwise compared the transcriptome data from four different organs. In the process of DEGs analysis, we used the Benjamin-Hochberg method to correct the significant row p-value obtained from the original hypothesis test. In the screening process, FDR <0.01 and FC (fold change) ≥ 2 were used as the screening criteria. G. rhodantha is a perennial herbaceous grass used as medicines.
Considering their roots are perennial and its aerial parts (stem, leaf, and flower) are annual, we chose root as the control group. When we compared the control group (root) with the experimental group (stem, leaf, and flower), we found significant transcript differences. The number of differential expressions was the highest between the root and flower, whereas it was the lowest between the root and leaves (Fig. 7A). A total of 4,606 transcripts showed organ-specific expression, of which 966, 2,191, 274, and 1,175 transcripts were from the root, stem, leaf, and flower, respectively (Fig. 7B).
The annotation and analysis of metabolic pathways of differentially expressed genes (DEGs) is helpful for further understanding the function of genes. When regarding root as the control group and the stem as the experimental group, 6,156 DEGs were enriched in 129 metabolic pathways, including benzoxazinoid biosynthesis (13), flavoid biosynthesis (39), stilbenoid, dialylheptanoid and ginger biosynthesis (19), circadian rhythm-plant (41), and photosynthesis-antenna proteins (24). When regarding root as the control group and leaf as  (17), and flavor biosynthesis (22). Finally, when regarding root as the control group and the flower as the experimental group, 6,649 DEGs were enriched in 131 metabolic pathways, which were significantly enriched in cutin, suberine, and wax biosynthesis (22), zeatin biosynthesis (20), cyanoamino acid metabolism (33), flavor biosynthesis (27), and plant hormone signal transduction (165) ( Table 6). These DEGs were significantly enriched in flavonoid biosynthesis, which showed that the flavonoid accumulation was more abundant.

Gene expression analysis of unigenes associated with mangiferin, swertiamarin, and loganic acid biosynthetic pathway
Although mangiferin is one of the main active components of G. rhodantha, its biosynthesis has not been included in the KEGG database to date. Therefore, we focused on analysis and identification of the putative genes in the secoiridoid biosynthesis pathway. The secoiridoid biosynthesis is probably completed using the following steps: intermediate generation, terpene skeleton synthesis, and post-modification (Wu & Liu, 2017;Kang et al., 2021). The analysis results showed that 54 unigenes in the G. rhodantha transcriptome encoding 17 key enzymes in different organs (Table 7) and related genes, which is represented by the heat map.
A wide variety of terpenoids with diverse structures are synthesized from common precursors isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). These compounds can be derived from both the mevalonic acid (MVA) pathway in the cytoplasm and the 2-C-methyl-D-erythritol-4-phosphate (MEP) pathway in plastids (Singh, Gahlan & Kumar, 2012). IPP and DMAPP are catalyzed by geranyl pyrophosphate synthase (GPPS) to geranyl diphosphate (GPP), which is an important cut-off point. The GPP flow through different metabolic directions to monoterpenes, diterpenes, triterpenes, etc.
For the synthesis of secoiridoid, geraniol is the starting compound. There may be three pathways from GPP to geraniol (Fig. 8A). The pathway involving the transformation of GPP into geraniol and pyrophosphate by the action of geraniol synthase (GES) was fully annotated, while the other two pathways were not fully annotated. Geraniol was converted into iridodial, which was the skeleton of secoiridoids under the excessive step reaction (Miettinen et al., 2014;Lichman et al., 2019). Iridodial is converted into secoiridoid glycoside compounds under a series of modification processes involving the addition of sugar groups, deoxygenation, ring opening, etc (Liu et al., 2017;Rain & Takahashi, 2016). Based on the statistics of TPM value, we searched for the TPM values of the genes in different organs and then used TBtools(v1.098) to plot the heatmap. For MVA pathway, the AACT2 expression is relatively high in the root, stem, and flower, whereas that of HMGS1 is relatively high in root and flower (Fig. 8B). For the MEP pathway, the relative expression of DXR was higher in the leaf and flower, while that of HDR was higher in the aerial parts (Fig. 8C). In secoiridoid pathway, the relative expression of 8-HGO was also higher in the aerial part (Fig. 8D).
In the gentiopicroside biosynthesis pathway, 8-HGO is particularly important as its structural gene (Wang, 2020) . 8-HGO is the key enzyme behind the secoiridoid skeleton construction. Additionally, we verified the relative expression of 8-HGO from the four organs using qRT-PCR. These results showed a similar expression pattern with that of the transcriptome. The relative expression of the 8-HGO gene was higher in the aerial parts than in the root (Fig. 9). Therefore, this suggested that the secoiridoid component was mainly synthesized in aerial part, especially in the leaves.

Contents of mangiferin, swertiamarin, and loganic acid
To examine the possible relationship between the gene expression and their corresponding metabolites, we determined the content of three bioactive compounds including two secoiridoid pathway metabolites (swertiamarin and loganic acid and mangiferin. After comparing their contents, all three compounds showed similar accumulation patterns in the different organs, with the lowest levels being in the root. Furthermore, swertiamarin was the least abundant in the root and the most abundant in flower (Fig. 10). Therefore, we found that the contents of the three components in the aerial parts were significantly higher than in the root.

DISCUSSION
Secoiridoid is one of the important medicinal components for G. rhodantha (Li et al., 2006;Inao et al., 2004). The analysis of the secoiridoid biosynthesis pathway is very important for    (Kang et al., 2021), G. waltonii (79,455 unigenes obtained, average length 834 bp) (Ni et al., 2019). There were 30,452 unigenes annotated in the Nr database for G. rhodantha, with the results showing the highest similarity to C. arabica from the Rubiaceae family. This may be due to the limited data about Gentianaceae. Based on the results of the DEGs of G. rhodantha, we found more metabolic pathways in the root with respect to flower than in the other contrast pairs (root with respect to stem, root with respect to leaf). The DEGs were mainly enriched in phytohormone signaling, flavonoid biosynthetic pathways, and cyanogenic amino acid metabolism. Higher plants contain diverse secondary metabolites, which are closely related to their medicinal effects. Additionally, mangiferin, swertiamarin, and loganic acid showed similar accumulation patterns in the different organs, with the lowest levels seen in the root. This may be because the accumulation of the extremely bitter tasting swertiamarin in the aerial parts, especially in the flower, could effectively defend against predator aggression. Mangiferin showed the highest accumulation in all the organs, especially in the leaf, which was consistent with the previous studies (Zhang et al., 2007). The synthesis of secondary metabolites is a complex multi-step process (Wu et al., 2022). KEGG analysis showed that 1,422 unigenes were involved in 25 secondary metabolic pathways, including isoquinoline alkaloid biosynthesis (ko00905), dieterpenoid biosynthesis (ko00904), and sesquiterpenoid and triterpenoid biosynthesis (ko00909). In this study, we selected 8-HGO from the secoiridoid biosynthesis pathway for qRT-PCR validation experiments, and the results were consistent with the expected results. The low content of these three components in the root showed a correlation with the relatively low expression of 8-HGO, thereby confirming the presence of 8-HGO in the secoiridoid biosynthetic pathway. This was consistent with previous reports about G. rhodantha and those of other species of Gentianaceae, like Swertia mussotii (Shen et al., 2016;Liu et al., 2017). Based on above research, it can be inferred that this active composition is mainly synthesized in the aerial part, with the medicinal value of the aerial part being higher than the root. Moveover, the biomass of the roots is very small. When used as a medicinal material, we recommend that the medicinal part should be changed to the aerial part for better protection of the resources and the maintenance of sustainable use.
TFs can control gene expression by specifically binding with the cis-regulatory elements in the promoter region of target genes, and play a key regulatory role in the plant growth and development (Latchman, 1997). Currently, hundreds of TF families have been isolated and identified from higher plants, which are closely related to plant stress resistance, and can regulate the expression of genes related to different plant stressors, like drought, high salt, low temperature, and pathogens (Gibbs et al., 2015;Verma, Ravindran & Kumar, 2016). The number of genes encoding different TFs families varies in different plant species, and they often have species-tissue-specific or developmental stage-specific function(s) (Singh, 1998). A total of 1,005 unigenes were involved in encoding 66 TF families for G. rhodantha. The members of the TF families did not exhibit any significant organ-specificity. Among all the TFs, the C2H2 transcription factor family was the most abundant (89, 8.9%). C2H2 zinc finger proteins play roles in the plant response to a diverse stresses, including low temperatures, salt, drought, oxidative stress, excessive light, and silique shattering (Kim et al., 2013;Yue et al., 2016;Jiang et al., 2022). C2H2 may also be important in the synthesis of secondary metabolites involved in the stress resistance of G. rhodantha. It can be considered for the study of transcription factors regulating the quality of G. rhodantha. Furthermore, 141 unigenes were annotated as WRKY family transcription factors, of which 17 showed higher expressions in the leaf than in the root. WRKY may be a good candidate for studying the secoiridoid biosynthesis regulation in G. rigescens (Zhang et al., 2015). The identification of these transcription factors will be helpful to further analyze the molecular mechanism of secoiridoid biosynthesis and lay a foundation for regulating the secoiridoid metabolite accumulation. Therefore, these findings are highly significant as they provide a reference for mining the key genes of the biosynthesis pathway of secondary metabolites.
As far as the reported transcriptome analysis of G. rhodantha is concerned, it is only the initial stage, and more extensive research is necessary. Furthermore, the expression of genes related to secondary metabolites is also affected by many factors, including plant growth and developmental stage and the ecological environment. Since we had only studied the flowering period, therefore, we could not capture all the gene expression-related information.
Simple repeats, also known as short tandem repeats, are 1-6 nt long DNA sequences widely distributed in the eukaryotic genomes. Six nucleotides form repetitive motifs in different orders. Since the motifs are repeated several times, so they have repeatability, polymorphism, richness, and co-dominance (Chen et al., 2012;Shi et al., 2016). We found a total of 7388 SSR loci in the transcriptome of G. rhodantha, including various nucleotide types, thereby indicating that the SSR loci of G. rhodantha was rich and abundant. The number of single-nucleotides repeats of A/T is the largest (4,573), which was consistent with the previous results of Gardenia jasminoide (Liu et al., 2022a;Liu et al., 2022b;Liu et al., 2022c). Unfortunately, the SSR data for other species of the Gentiana genus have not been reported yet. Therefore, our study presents information for the further development of SSR molecular markers and the DNA ID code construction of G. rhodantha.

CONCLUSION
In this study, we obtained the transcriptome data of G. rhodantha by high-throughput sequencing for the first time, and then determined the genes and DEGs involved in the secoiridoid biosynthesis pathway. Using qRT-PCR, we further verified the RNA-seq analysis results for one key enzyme gene related to secoiridiod biosynthesis. Therefore, our findings provide timely clues for a better understanding of the molecular mechanism of secoiridoid biosynthesis in G. rhodantha.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
This work was supported by the National Natural Science Foundation of China (30260110037) and the Joint special program of traditional Chinese medicine in Yunnan Province-General Program (30272110111). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.